Pattern Classification In Symbolic Streams 
Abstract: We propose a technique for pattern classification 
in symbolic streams via selective erasure of observed symbols, 
in cases where the patterns of interest are represented as 
Probabilistic Finite State Automata (PFSA). We define an additive 
abelian group for a slightly restricted subset of probabilistic finite 
state automata (PFSA), and the group sum is used to formulate 
pattern-specific semantic annihilators. The annihilators attempt 
to identify pre-specified patterns via removal of essentially 
all inter-symbol correlations from observed sequences, thereby 
turning them into symbolic white noise. Thus a perfect annihi- 
lation corresponds to a perfect pattern match. This approach 
of classification via information annihilation is shown to be 
strictly advantageous, with theoretical guarantees, for a large 
class of PFSA models. The results are supported by simulation 
experiments. 

Index Terms — Probabilistic Finite State Machines, Machine 
Learning, Pattern Classification 



I. Introduction and Motivation 

The principal focus of this work is the development of 
an efficient algorithm for identifying pre-specified patterns of 
interest in observed symbolic data streams, where the patterns 
are represented as Probabilistic Finite State Automata (PFSA) 
over pre-defined symbolic alphabets. 

A finite state automaton (FSA) is essentially a finite graph 
where the nodes are known as states and the edges are 
known as transitions, which are labeled with letters from an 
alphabet. A string or a symbol string generated by a FSA 
is a sequence of symbols belonging to an alphabet, which 
are generated by stepping through a series of transitions 
in the graph. Probabilistic finite state automata, considered 
in this paper, are finite state machines with probabilities 
associated with the transitions. PFSA have been extensively studied 
as an efficient framework for learning the causal structure 
of observed dynamical behavior [1]. This is an example of 
inductive inference [2], defined as the process of hypothesizing 
a general rule from examples. In this paper, we are concerned 
with the special case, where the inferred general rule takes 
the form of a PFSA, and the examples are drawn from a 
stochastic regular language. Conceptually, in such scenarios, 
one is trying to learn the structure inside of some black box, 
which is continuously emitting symbols 0]. The system of 
interest may emit a continuous-valued signal, which must 
then be adequately partitioned to yield a symbolic stream. Note 
that such partitioning is merely quantization and not data 
labeling, and several approaches for efficient symbolization 
have been reported [3]. 

Probabilistic automata are more general compared to their 
non-probabilistic counterparts Q, and are more suited to 
modeling stochastic dynamics. It is important to distinguish 
between the PFSA models considered in this paper, and the 
ones considered by Paz [5], and in the detailed recent review 
by Vidal et al. [6]. In the latter framework, symbol generation 
probabilities are not specified, and we have a distribution 
over the possible end states, for a given initial state and an 
observed symbol. In the models considered in this paper, 
symbol generation is probabilistic, but the end state for a given 
initial state and generated symbol is unique. Unfortunately, 
authors have referred to both these formalisms as probabilistic 
finite state automata in the literature. The work presented here 
specifically considers the second modeling paradigm, as considered 
and formalized in Q, 10, @, 0. 

The case for using PFSA as pattern classification tools is 
compelling. Finite automata are simple, and the sample and 
time complexity required for learning them can be easily 
characterized. This yields significant computational advan- 
tage in time constrained applications, over more expressive 
frameworks such as belief (Bayesian) networks iflOl , ifTTl or 
Probabilistic Context Free Grammars (PCFG) Q3, El (also 
see [14] for a general approach to identifying PCFGs from 
observations) and hidden Markov models (HMMs) [15]. Also, 
from a computational viewpoint, it is possible to come up with 
provably efficient algorithms to optimally learn PFSA, whereas 
"optimally learning HMMs is often hard" Q). Furthermore, 
most reported work on HMMs fl5l . fl6l . ifTTll assumes the 
model structure or topology is specified in advance, and the 
learning procedure is merely training, i.e., finding the right 
transition probabilities. For PFSA based analysis, researchers 
have investigated the more general problem of learning the 
model topology, as well as the transition probabilities, which 
implies that such analysis can then be applied to domains 
where there is no prior knowledge as to what the correct 
structure might look like [18]. 

Although the reported PFSA construction algorithms JU, 
0, [l] (referred to as the direct compression algorithms in 
the sequel) are asymptotically efficient, time-critical applications 
(e.g., pattern classification in sensing and surveillance 
networks) often demand faster identification than the state 
of the art can provide. This motivates the key problem investigated 
in this paper: 

Given a set of PFSA models representing patterns of interest 
(i.e. a PFSA based pattern dictionary or library), the problem 
is to identify (in real or near-real time) if any of the specified 
patterns of interest exist in an observed symbol sequence, 
without resorting to direct compression and subsequent com- 
parison of the constructed PFSA model against the library 
elements. 

We propose a novel classification technique based on selective 
erasure of observed symbols leading to perfect information 
annihilation, as illustrated in Figure 1. Specifically, we 
construct an additive abelian group over a slightly restricted 
subset of all PFSA (over a fixed alphabet), and show that 
it is possible to define pattern-specific semantic annihilators 
as a function of the group inverses. These annihilators can 
then operate on the sensed stream in a symbol-by-symbol 
fashion attempting to eliminate all inter-symbol correlations. 
The annihilation is shown to be perfect if and only if the 
annihilator corresponds exactly to the inverted PFSA model 
of the underlying generating process. Thus we need to only 
check if the annihilated stream (corresponding to a particular 
PFSA) is free from any emergent pattern, i.e., if the symbols 
are equi-probable in a history-independent manner (denoted 
as symbolic white noise in the sequel) to infer the existence 
of that pattern in the original observed stream. 

The proposed approach is computationally more efficient than 
direct compression of the symbol stream, since it is significantly 
easier to check whether a symbolic stream is in fact symbolic white 
noise: the underlying PFSA model then has a single state with 
equal symbol generation probabilities, as seen in Figures 2(b) and 
2(c). It is also shown that the proposed technique is provably 
faster if the cardinality of the alphabet is not greater than 
the number of states in a particular pattern of interest, which 
represents almost all PFSA models encountered in practice. 

The rest of the paper is organized into ten additional sections. 
Section II is a brief overview of preliminary concepts and 
related work. Section III presents the construction of the 
additive abelian group for probability measures on symbol 
strings, which is then shown to induce an abelian group on 
a restricted set of PFSA. Section IV develops a practical 
implementation of the PFSA sum, which is then used to 
formulate the notion of the semantic annihilators in Section V. 
Section VI identifies the theoretical conditions under which 
we can guarantee classification via semantic annihilation to 
be faster than direct compression. Section VIII establishes 
asymptotic bounds on the run-time complexity of annihilators. 
Simulation results are presented in Section IX, and pertinent 
discussions, intuitive interpretations, and potential applications 
are delineated in Section X. The paper is concluded in Section XI 
with recommendations for future work. 

II. Preliminary Concepts and Related Work 

A string $x$ over an alphabet (i.e., a non-empty finite set) $\Sigma$ 
is a finite-length sequence of symbols in $\Sigma$ [19]. The length of 
a string $x$ is the number of symbols in $x$ and is denoted by 
$|x|$. The Kleene closure of $\Sigma$, denoted by $\Sigma^*$, is the set of all 
finite-length strings of symbols including the null string $\epsilon$. The 
set of all strictly infinite-length strings of symbols is denoted 
as $\Sigma^\omega$. The string $xy$ is the concatenation of strings $x$ and 
$y$. Therefore, the null string $\epsilon$ is the identity element of the 
concatenative monoid. 

Definition 1 (PFSA): A probabilistic finite state automaton 
(PFSA) is a tuple $G = (Q, \Sigma, \delta, q_0, \tilde{\Pi})$, where $Q$ is a 
(nonempty) finite set, called the set of states; $\Sigma$ is a (nonempty) 
finite set, called the input alphabet; $\delta : Q \times \Sigma \to Q$ is the state 
transition function; $q_0 \in Q$ is the start state; $\tilde{\Pi} : Q \times \Sigma \to [0, 1]$ 
is an output mapping, known as the probability morph 
function, that specifies the state-specific symbol generation 
probabilities, and satisfies $\forall q_i \in Q, \sigma \in \Sigma$, $\tilde{\Pi}(q_i, \sigma) \geq 0$, and 
$\sum_{\sigma \in \Sigma} \tilde{\Pi}(q_i, \sigma) = 1$. 

Notation 1: In the sequel, we often use a matrix 
representation $\tilde{\Pi}$ (denoted as the morph matrix) of the morph 
function, with the $ij$-th element given by $\tilde{\Pi}(q_i, \sigma_j)$. Note that 
$\tilde{\Pi}$ is, in general, a rectangular non-negative matrix with row 
sums equal to unity. Also, from a knowledge of the morph 
matrix $\tilde{\Pi}$ and the transition map $\delta$, one can compute the 
stochastic state transition matrix $\Pi$ as: 

\[ \Pi_{ij} = \sum_{\sigma_k : \delta(q_i, \sigma_k) = q_j} \tilde{\Pi}(q_i, \sigma_k) \tag{1} \]

Note that $\Pi$ is a square non-negative stochastic matrix. 
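
As a minimal sketch of Eq. (1), the following Python fragment accumulates morph probabilities into a state transition matrix. The array encoding (`delta[i][k]` as the index of the next state, `morph[i, k]` as the morph probability) and the two-state example are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def transition_matrix(delta, morph):
    """Eq. (1): Pi[i, j] sums morph[i, k] over all symbols k with delta[i][k] == j."""
    n_states, n_symbols = morph.shape
    Pi = np.zeros((n_states, n_states))
    for i in range(n_states):
        for k in range(n_symbols):
            Pi[i, delta[i][k]] += morph[i, k]
    return Pi

# Hypothetical two-state PFSA over a binary alphabet (illustrative only).
delta = [[0, 1], [0, 1]]                 # delta[i][k]: next state from state i on symbol k
morph = np.array([[0.2, 0.8],
                  [0.4, 0.6]])           # rows sum to one
Pi = transition_matrix(delta, morph)
```

Since every symbol leads to exactly one next state, each row of `Pi` inherits the unit row sum of the morph matrix.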

Notation 2: The transition map $\delta$ naturally induces an extended 
transition function $\delta^* : Q \times \Sigma^* \to Q$ such that 
$\delta^*(q, \epsilon) = q$ and $\delta^*(q, x\tau) = \delta(\delta^*(q, x), \tau)$ for $q \in Q$, $x \in \Sigma^*$ 
and $\tau \in \Sigma$. 

We assume that the underlying graph for the PFSA models 
considered in this paper is irreducible, i.e., strongly connected. 
This implies that the transition probability matrix $\Pi$ 
is an irreducible stochastic matrix, and in particular, has a 
unique stationary distribution [20] irrespective of the initial 
distribution. This assumption is motivated by the association 
of PFSA with emerging patterns in statistically stationary symbolic 
streams, because it makes little sense to represent such 
dynamical systems with models whose stationary behavior 
would depend on the initial state. Furthermore, the theoretical 
development in the sequel necessitates this assumption for 
technical reasons. 

Notation 3: In the sequel, we denote the PFSA constructed 
by directly compressing a symbol string $\omega \in \Sigma^*$ as $\mathbb{C}(\omega)$. 
The specific algorithm used is not important for the analysis 
presented in this paper. 

Definition 2 ($\sigma$-Algebra): A collection $\mathfrak{M}$ of subsets of a 
non-empty set $X$ is said to be a $\sigma$-algebra [21] in $X$ if $\mathfrak{M}$ has 
the following properties: 

1) $X \in \mathfrak{M}$ 

2) If $A \in \mathfrak{M}$, then $A^c \in \mathfrak{M}$, where $A^c$ is the complement 
of $A$ relative to $X$, i.e., $A^c = X \setminus A$ 

3) If $A = \bigcup_{n=1}^{\infty} A_n$ and if $A_n \in \mathfrak{M}$ for $n \in \mathbb{N}$, then 
$A \in \mathfrak{M}$. 

Definition 3 (Measure): A (non-negative) measure is a 
countably additive function $\mu$, defined on a $\sigma$-algebra $\mathfrak{M}$, 
whose range is $[0, \infty]$. Countable additivity means that if $\{A_i\}$ 
is a pairwise disjoint countable collection of members of $\mathfrak{M}$, 
then $\mu\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mu(A_i)$. 

Definition 4 (Probability Measure): A probability measure 
on a non-empty set with a specified $\sigma$-algebra $\mathfrak{M}$ is a finite 
(non-negative) measure on $\mathfrak{M}$. Although not required by the 
theory, a probability measure is defined to have the unit 
interval $[0, 1]$ as its range. 

Definition 5 (Measure Space): A probability measure 
space is a triple $(X, \mathfrak{M}, p)$ where $X$ is a non-empty set, $\mathfrak{M}$ 
is a $\sigma$-algebra in $X$, and $p$ is a finite non-negative measure 
on $\mathfrak{M}$. 

Definition 6 ($\sigma$-Algebra for Symbolic Strings): Given an 
alphabet $\Sigma$, the set $\mathfrak{B}_\Sigma \triangleq 2^{\Sigma^*}\Sigma^\omega$ is defined to be the 
$\sigma$-algebra generated by the set $\{L : L = x\Sigma^\omega \text{ where } x \in \Sigma^*\}$, 
i.e., the smallest $\sigma$-algebra on the set $\Sigma^\omega$ which contains the 
set $\{L : L = x\Sigma^\omega \text{ s.t. } x \in \Sigma^*\}$. 

For brevity, the probability $p(x\Sigma^\omega)$ is denoted as $p(x)$, $\forall x \in \Sigma^*$, 
in the sequel. In other words, $p(x)$ is the probability of the 
occurrence of all the strings with $x$ as a prefix. 

Definition 7 (Probabilistic Nerode Relation): Given an alphabet 
$\Sigma$, any two strings $x, y \in \Sigma^*$ are said to satisfy 
the probabilistic Nerode relation $\mathcal{N}_p$ on a probability space 
$(\Sigma^\omega, \mathfrak{B}_\Sigma, p)$, denoted by $x \mathcal{N}_p y$, if either of the following 
conditions is true: 

1) $p(x) = p(y) = 0$; 

2) $\forall u \in \Sigma^*$, $\frac{p(xu)}{p(x)} = \frac{p(yu)}{p(y)}$, provided that $p(x) \neq 0$, $p(y) \neq 0$. 

It has been proved in [9] that the probabilistic Nerode relation 
defined above is a right-invariant equivalence relation [19], 
which means that if two strings $x, y$ are equivalent, so are 
any right extensions of the strings, i.e., 

\[ \forall x, y, u \in \Sigma^*, \quad x \mathcal{N}_p y \implies xu \, \mathcal{N}_p \, yu \]

In the sequel, this is referred to as probabilistic Nerode 
equivalence and we denote the Nerode equivalence class of a 
string $x$ on $\Sigma^*$ by $[x]_p$, i.e., $[x]_p = \{z \in \Sigma^* : x \mathcal{N}_p z\}$. The right 
invariance property induces the notion of states and hence is 
crucial to the definition of probabilistic state machines; by this 
property two equivalent strings have probabilistically indistinguishable 
future evolution and therefore can be visualized as 
terminating on the same state, as seen in Figure 2(a). In this 
context, we make the following observation: 




Fig. 2: Linguistic Concepts: (a) Concept of PFSA states 
from Probabilistic Nerode Equivalence: Nerode equivalent 
strings $\omega_1, \omega_2$ have probabilistically indistinguishable future 
evolution, thus leading to the same state $q_i$. (b) Symbolic 
White Noise (See Eq. (2) for formal definition) for alphabet 
$\Sigma = \{\sigma_0, \sigma_1, \sigma_2\}$; (c) for alphabet $\Sigma = \{\sigma_0, \sigma_1\}$ 

A symbolic dynamical process has a probabilistic fi- 
nite state description if and only if the corresponding 
Nerode equivalence has a finite index. 
Definition 8 (Space of PFSA): The space of all PFSA over 
a given symbol alphabet is denoted by $\mathscr{A}$ and the space of all 
probability measures $p$ that induce a finite-index probabilistic 
Nerode equivalence on the corresponding measure space 
$(\Sigma^\omega, \mathfrak{B}_\Sigma, p)$ is denoted by $\mathscr{P}$. 

As expected, there is a close relationship between $\mathscr{A}$ and $\mathscr{P}$, 
which is made explicit in the sequel. 

Definition 9 (PFSA Map $\mathbb{H}$): Let $p \in \mathscr{P}$ and $G = 
(Q, \Sigma, \delta, q_0, \tilde{\Pi}) \in \mathscr{A}$. The map $\mathbb{H} : \mathscr{A} \to \mathscr{P}$ is defined as 
$\mathbb{H}(G) = p$ such that the following condition is satisfied: 

\[ \forall x = \sigma_1 \cdots \sigma_r \in \Sigma^*, \quad p(x) = \tilde{\Pi}(q_0, \sigma_1) \prod_{k=1}^{r-1} \tilde{\Pi}\big(\delta^*(q_0, \sigma_1 \cdots \sigma_k), \sigma_{k+1}\big) \]

where $r \in \mathbb{N}$, the set of positive integers. 

Definition 10 (Right Inverse $\mathbb{H}_{-1}$): The right inverse of the 
map $\mathbb{H}$ is denoted by $\mathbb{H}_{-1} : \mathscr{P} \to \mathscr{A}$ such that 

\[ \forall p \in \mathscr{P}, \quad \mathbb{H}(\mathbb{H}_{-1}(p)) = p \]

An explicit construction of the map $\mathbb{H}_{-1}$ is reported in [9] 
and is not presented in this paper, because we only require 
that such a map exists. 
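
The product formula in Definition 9 can be evaluated by walking the automaton one symbol at a time. The sketch below is only an illustration under assumed conventions: states and symbols are integer indices, `delta[q][s]` is the next state, and `morph[q, s]` is the morph probability; the two-state machine is hypothetical.

```python
import itertools
import numpy as np

def string_probability(delta, morph, q0, x):
    """p(x): probability that the PFSA started at q0 emits x as a prefix."""
    p, q = 1.0, q0
    for s in x:
        p *= morph[q, s]      # probability of emitting s at the current state
        q = delta[q][s]       # deterministic move to the next state
    return p

# Hypothetical two-state PFSA over the alphabet {0, 1}.
delta = [[0, 1], [0, 1]]
morph = np.array([[0.2, 0.8],
                  [0.4, 0.6]])

p_10 = string_probability(delta, morph, 0, (1, 0))       # 0.8 * 0.4
total = sum(string_probability(delta, morph, 0, x)
            for x in itertools.product(range(2), repeat=3))
```

Summing `p(x)` over all strings of a fixed length returns one, which is a quick sanity check that the encoded object behaves as a probability measure on prefixes.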

Definition 11 (Perfect Encoding): Given an alphabet $\Sigma$, a 
PFSA $G = (Q, \Sigma, \delta, q_0, \tilde{\Pi})$ is said to be a perfect encoding of 
the measure space $(\Sigma^\omega, \mathfrak{B}_\Sigma, p)$ if $p = \mathbb{H}(G)$. 

There are possibly many PFSA realizations that encode 
the same probability measure on $\mathfrak{B}_\Sigma$, due to the existence of 
non-minimal realizations and state relabeling; neither of these 
affects the underlying encoded measure. From this perspective, 
a notion of PFSA equivalence is introduced as follows: 

Definition 12 (PFSA Equivalence): Two PFSA $G_1$ and $G_2$ 
are defined to be equivalent if $\mathbb{H}(G_1) = \mathbb{H}(G_2)$. In this case, 
we say $G_1 = G_2$. 

Remark 1: In the sequel, a PFSA $G$ implies the equivalence 
class of $G$, i.e., $\{P \in \mathscr{A} : \mathbb{H}(P) = \mathbb{H}(G)\}$. This concept is 
similar to the equivalence class of almost everywhere equal 
functions being a unique vector in the $L^p$-space [21]. 

Definition 13 (Structural Equivalence): Two PFSA $G_i = 
(Q_i, \Sigma, \delta_i, q_0^{(i)}, \tilde{\Pi}_i) \in \mathscr{A}$, $i = 1, 2$, are defined to have the 
equivalent (or identical) structure if $Q_1 = Q_2$, $q_0^{(1)} = q_0^{(2)}$ and 
$\delta_1(q, \sigma) = \delta_2(q, \sigma)$, $\forall q \in Q_1$, $\forall \sigma \in \Sigma$. 

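Definition 14 (Synchronous Composition of PFSA): The 
binary operation of synchronous composition of two PFSA 
$G_i = (Q_i, \Sigma, \delta_i, q_0^{(i)}, \tilde{\Pi}_i) \in \mathscr{A}$, where $i = 1, 2$, denoted by 
$\otimes : \mathscr{A} \times \mathscr{A} \to \mathscr{A}$, is defined as 

\[ G_1 \otimes G_2 = \big(Q_1 \times Q_2, \Sigma, \delta', (q_0^{(1)}, q_0^{(2)}), \tilde{\Pi}'\big) \]

where $\delta'$ and $\tilde{\Pi}'$ are computed as follows: 

\[ \forall q_i \in Q_1, q_j \in Q_2, \sigma \in \Sigma, \quad
\begin{cases}
\delta'\big((q_i, q_j), \sigma\big) = \big(\delta_1(q_i, \sigma), \delta_2(q_j, \sigma)\big) \\
\tilde{\Pi}'\big((q_i, q_j), \sigma\big) = \tilde{\Pi}_1(q_i, \sigma)
\end{cases} \]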



Remark 2: In general, the operation $\otimes$ of synchronous 
composition is non-commutative. 

Proposition 1 (Synchronous Composition of PFSA): Let 
$G_1, G_2 \in \mathscr{A}$. Then, $\mathbb{H}(G_1) = \mathbb{H}(G_1 \otimes G_2)$ and therefore 
$G_1 = G_1 \otimes G_2$ in the sense of Definition 12. 

Proof: See Theorem 4.5 in [5]. ■ 
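
Proposition 1 can be checked numerically on a small case. The sketch below builds the synchronous composition on integer-indexed states and compares string probabilities of $G_1$ and $G_1 \otimes G_2$ for all strings up to length five. The encoding, the pair-flattening rule, and both example machines are assumptions made purely for illustration.

```python
import itertools
import numpy as np

def sync_compose(delta1, morph1, delta2):
    """G1 (x) G2: product structure; morph probabilities are inherited from G1 only."""
    n1, n2 = len(delta1), len(delta2)
    n_sym = morph1.shape[1]
    idx = lambda i, j: i * n2 + j            # flatten the state pair (q_i, q_j)
    delta = [[0] * n_sym for _ in range(n1 * n2)]
    morph = np.zeros((n1 * n2, n_sym))
    for i in range(n1):
        for j in range(n2):
            for s in range(n_sym):
                delta[idx(i, j)][s] = idx(delta1[i][s], delta2[j][s])
                morph[idx(i, j), s] = morph1[i, s]
    return delta, morph

def string_probability(delta, morph, q0, x):
    p, q = 1.0, q0
    for s in x:
        p *= morph[q, s]
        q = delta[q][s]
    return p

# Hypothetical machines: G1 has two states, G2 contributes only its three-state structure.
delta1 = [[0, 1], [0, 1]]
morph1 = np.array([[0.2, 0.8], [0.4, 0.6]])
delta2 = [[1, 0], [2, 0], [2, 1]]
delta12, morph12 = sync_compose(delta1, morph1, delta2)

# Both machines start at index 0, i.e. q0 for G1 and (q0, q0) for the composition.
max_gap = max(
    abs(string_probability(delta1, morph1, 0, x)
        - string_probability(delta12, morph12, 0, x))
    for n in range(1, 6)
    for x in itertools.product(range(2), repeat=n)
)
```

The composition has six states but encodes the same measure as $G_1$, which is the non-minimal realization the text relies on when aligning the structures of two machines.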

Synchronous composition of PFSA allows transformation 
of PFSA with disparate structures to non-minimal descriptions 
that have the same underlying graphs. This assertion is crucial 
for the development in the sequel, since any binary operation 
defined for two PFSA with an identical structure can be 
extended to the general case on account of Definition 14 and 
Proposition 1. 

III. Abelian Group of PFSA 

This section shows that a subspace of PFSA can be assigned 
the algebraic structure of an abelian group. We first construct 
the abelian group on a subspace of probability measures, and 



then induce the group structure on this subspace of PFSA via 
the isomorphism between the two spaces. 

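Definition 15 (Restricted PFSA Space): Let $\mathscr{A}^+ = \{G = 
(Q, \Sigma, \delta, q_0, \tilde{\Pi}) : \tilde{\Pi}(q, \sigma) > 0 \ \forall q \in Q, \forall \sigma \in \Sigma\}$, which is a 
proper subset of $\mathscr{A}$. It follows that the transition map of any 
PFSA in the subset is a total function. We restrict the 
map $\mathbb{H} : \mathscr{A} \to \mathscr{P}$ to the smaller domain $\mathscr{A}^+$, that is, $\mathbb{H}^+ : 
\mathscr{A}^+ \to \mathscr{P}$, i.e., $\mathbb{H}^+ = \mathbb{H}|_{\mathscr{A}^+}$. 

Definition 16 (Restricted Probability Measure): Let 
$\mathscr{P}^+ = \{p \in \mathscr{P} : p(x) \neq 0, \forall x \in \Sigma^*\}$, which is a proper 
subset of $\mathscr{P}$. Each element of $\mathscr{P}^+$ is a probability measure 
that assigns a non-zero probability to each string on $\mathfrak{B}_\Sigma$. 
Similar to Definition 15, we restrict $\mathbb{H}_{-1}$ to $\mathscr{P}^+$, i.e., 
$\mathbb{H}^+_{-1} = \mathbb{H}_{-1}|_{\mathscr{P}^+}$. 

Since we do not distinguish PFSA in the same equivalence 
class (See Definition 12), we have the following result. 

Proposition 2 (Isomorphism of $\mathbb{H}^+$): The map $\mathbb{H}^+$ is an 
isomorphism between the spaces $\mathscr{A}^+$ and $\mathscr{P}^+$, and its inverse 
is $\mathbb{H}^+_{-1}$. 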

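Definition 17 (Abelian Operation on $\mathscr{P}^+$): The addition 
operation $\oplus : \mathscr{P}^+ \times \mathscr{P}^+ \to \mathscr{P}^+$ is defined by 
$p_3 = p_1 \oplus p_2$, $\forall p_1, p_2 \in \mathscr{P}^+$, such that 

1) $p_3(\epsilon) = 1$. 

2) $\forall x \in \Sigma^*$ and $\tau \in \Sigma$, 

\[ \frac{p_3(x\tau)}{p_3(x)} = \frac{p_1(x\tau)\, p_2(x\tau)}{\sum_{\alpha \in \Sigma} p_1(x\alpha)\, p_2(x\alpha)} \]

$p_3$ is a well-defined probability measure in $\mathscr{P}^+$, since 
$\forall x \in \Sigma^*$, 

\[ \sum_{\tau \in \Sigma} p_3(x\tau) = \sum_{\tau \in \Sigma} \frac{p_1(x\tau)\, p_2(x\tau)}{\sum_{\alpha \in \Sigma} p_1(x\alpha)\, p_2(x\alpha)}\, p_3(x) = p_3(x) \]

Proposition 3 (Abelian Group of PFSA): The algebra 
$(\mathscr{P}^+, \oplus)$ forms an abelian group. 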

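Proof: Closure property and commutativity of $(\mathscr{P}^+, \oplus)$ 
are obvious. The associativity, existence of identity and existence 
of inverse element are established next. 

(1) Associativity, i.e., $(p_1 \oplus p_2) \oplus p_3 = p_1 \oplus (p_2 \oplus p_3)$. We note 
that $\forall x \in \Sigma^*$, $\tau \in \Sigma$, 

\[
\begin{aligned}
\frac{((p_1 \oplus p_2) \oplus p_3)(x\tau)}{((p_1 \oplus p_2) \oplus p_3)(x)}
&= \frac{(p_1 \oplus p_2)(x\tau)\, p_3(x\tau)}{\sum_{\beta \in \Sigma} (p_1 \oplus p_2)(x\beta)\, p_3(x\beta)} \\
&= \frac{p_1(x\tau)\, p_2(x\tau)\, p_3(x\tau)}{\sum_{\beta \in \Sigma} p_1(x\beta)\, p_2(x\beta)\, p_3(x\beta)} \\
&= \frac{p_1(x\tau)\, (p_2 \oplus p_3)(x\tau)}{\sum_{\beta \in \Sigma} p_1(x\beta)\, (p_2 \oplus p_3)(x\beta)} \\
&= \frac{(p_1 \oplus (p_2 \oplus p_3))(x\tau)}{(p_1 \oplus (p_2 \oplus p_3))(x)}
\end{aligned}
\]

Here the second and third equalities expand the inner sum by 
Definition 17; its normalizing factor does not depend on the last 
symbol and therefore cancels between numerator and denominator. 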

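(2) Existence of identity: Let us introduce a probability 
measure $i_\circ$ of symbol strings such that: 

\[ \forall x \in \Sigma^*, \quad i_\circ(x) = \left(\frac{1}{|\Sigma|}\right)^{|x|} \tag{2} \]

where $|x|$ denotes the length of the string $x$. Then, $\forall \tau \in \Sigma$, we have 
that $\frac{i_\circ(x\tau)}{i_\circ(x)} = \frac{1}{|\Sigma|}$. For a measure $p \in \mathscr{P}^+$ and $\forall \tau \in \Sigma$, 

\[ \frac{(p \oplus i_\circ)(x\tau)}{(p \oplus i_\circ)(x)} = \frac{p(x\tau)\, i_\circ(x\tau)}{\sum_{\alpha \in \Sigma} p(x\alpha)\, i_\circ(x\alpha)} = \frac{p(x\tau)}{p(x)} \]

This implies that $p \oplus i_\circ = i_\circ \oplus p = p$ by Definition 17 and 
by commutativity. Therefore, $i_\circ$ is the identity of the monoid 
$(\mathscr{P}^+, \oplus)$. 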

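(3) Existence of inverse: $\forall p \in \mathscr{P}^+$, $\forall x \in \Sigma^*$ and $\forall \tau \in \Sigma$, let 
$-p$ be defined by the following relations: 

\[ (-p)(\epsilon) = 1 \tag{3} \]

\[ \frac{(-p)(x\tau)}{(-p)(x)} = \frac{p^{-1}(x\tau)}{\sum_{\alpha \in \Sigma} p^{-1}(x\alpha)} \tag{4} \]

Then, we have: 

\[ \frac{(p \oplus (-p))(x\tau)}{(p \oplus (-p))(x)} = \frac{p(x\tau)\, (-p)(x\tau)}{\sum_{\alpha \in \Sigma} p(x\alpha)\, (-p)(x\alpha)} = \frac{1}{|\Sigma|} \tag{5} \]

This gives $p \oplus (-p) = i_\circ$, which completes the proof. ■ 

In the sequel, we denote the zero-element $i_\circ$ of the abelian 
group $(\mathscr{P}^+, \oplus)$ as the symbolic white noise. The concept of 
symbolic white noise has been illustrated in Figures 2(b) and 2(c). 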



The extension to the general case is achieved by using syn- 
chronous composition of probabilistic machines. 

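Proposition 5 (PFSA Addition (General case)): Given two 
PFSA $G_1, G_2 \in \mathscr{A}^+$, the sum $G_1 + G_2$ is computed via 
Proposition 4 and Definition 14 as follows: 

\[ G_1 + G_2 = (G_1 \otimes G_2) + (G_2 \otimes G_1) \tag{8} \]

Proof: Noting that $G_1 \otimes G_2$ and $G_2 \otimes G_1$ have the same 
structure up to state relabeling, it follows from Proposition 1 that 

\[
\begin{aligned}
\mathbb{H}^+(G_1 + G_2) &= \mathbb{H}^+(G_1) \oplus \mathbb{H}^+(G_2) \quad \text{(See Definition 18)} \\
&= \mathbb{H}^+(G_1 \otimes G_2) \oplus \mathbb{H}^+(G_2 \otimes G_1) \\
&= \mathbb{H}^+\big((G_1 \otimes G_2) + (G_2 \otimes G_1)\big)
\end{aligned}
\]

which completes the proof. ■ 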
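Example 1: Let $G_1$ and $G_2$ be two PFSA with identical 
structures, such that the probability morph matrices are: 

\[ \tilde{\Pi}_1 = \begin{pmatrix} 0.2 & 0.8 \\ 0.4 & 0.6 \end{pmatrix} \quad \text{and} \quad \tilde{\Pi}_2 = \begin{pmatrix} 0.1 & 0.9 \\ 0.6 & 0.4 \end{pmatrix} \tag{9} \]

Then the $\tilde{\Pi}$-matrix for the sum $G_1 + G_2$, denoted by $\tilde{\Pi}_{12}$, is 

\[ \tilde{\Pi}_{12} = \begin{pmatrix} 0.1 \times 0.2 & 0.9 \times 0.8 \\ 0.6 \times 0.4 & 0.4 \times 0.6 \end{pmatrix} \xrightarrow{\text{Normalize}} \begin{pmatrix} 0.027 & 0.973 \\ 0.5 & 0.5 \end{pmatrix} \]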
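
The arithmetic of Example 1 can be reproduced in a few lines. The helper name `pfsa_sum` is hypothetical; it assumes both machines share one structure, so that the sum reduces to an elementwise product of morph matrices followed by row normalization.

```python
import numpy as np

def pfsa_sum(morph1, morph2):
    """Morph matrix of G1 + G2 for structurally identical PFSA:
    elementwise product of the rows, renormalized to unit row sums."""
    prod = morph1 * morph2
    return prod / prod.sum(axis=1, keepdims=True)

Pi1 = np.array([[0.2, 0.8], [0.4, 0.6]])
Pi2 = np.array([[0.1, 0.9], [0.6, 0.4]])
Pi12 = pfsa_sum(Pi1, Pi2)
```

The first row normalizes 0.02 and 0.72 by their sum 0.74, which gives the rounded values 0.027 and 0.973 quoted in the example.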



A. Explicit Computation of the abelian Operation $\oplus$ 

The isomorphism between $\mathscr{P}^+$ and $\mathscr{A}^+$ (See Proposition 2) 
induces the following abelian operation on $\mathscr{A}^+$. 

Definition 18 (Addition Operation on PFSA): Given any 
$G_1, G_2 \in \mathscr{A}^+$, the addition operation $+ : \mathscr{A}^+ \times \mathscr{A}^+ \to \mathscr{A}^+$ 
is defined as: 

\[ G_1 + G_2 = \mathbb{H}^+_{-1}\big(\mathbb{H}^+(G_1) \oplus \mathbb{H}^+(G_2)\big) \]

If the summand PFSA have identical structure (i.e., their 
underlying graphs are identical), then the explicit computation 
of this sum is stated as follows. 

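Proposition 4 (PFSA Addition): If two PFSA $G_1, G_2 \in 
\mathscr{A}^+$ are of the same structure, i.e., $G_i = (Q, \Sigma, \delta, q_0, \tilde{\Pi}_i)$, $i \in 
\{1, 2\}$, then we have $G_1 + G_2 = (Q, \Sigma, \delta, q_0, \tilde{\Pi})$ where 

\[ \tilde{\Pi}(q, \sigma) = \frac{\tilde{\Pi}_1(q, \sigma)\, \tilde{\Pi}_2(q, \sigma)}{\sum_{\alpha \in \Sigma} \tilde{\Pi}_1(q, \alpha)\, \tilde{\Pi}_2(q, \alpha)} \tag{6} \]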



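Proof: Let $p_i = \mathbb{H}^+(G_i)$, $i \in \{1, 2\}$. Since $G_1, G_2$ 
have the same structure, we have from the definition of the map $\mathbb{H}$ that 

\[ \forall \sigma \in \Sigma, \forall x \text{ s.t. } \delta^*(q_0, x) = q \in Q, \quad \frac{p_i(x\sigma)}{p_i(x)} = \tilde{\Pi}_i(\delta^*(q_0, x), \sigma) = \tilde{\Pi}_i(q, \sigma) \]

Now, by Definition 17 and Definition 9, 

\[ \tilde{\Pi}(q, \sigma) = \frac{(p_1 \oplus p_2)(x\sigma)}{(p_1 \oplus p_2)(x)} = \frac{p_1(x\sigma)\, p_2(x\sigma)}{\sum_{\alpha \in \Sigma} p_1(x\alpha)\, p_2(x\alpha)} = \frac{\tilde{\Pi}_1(q, \sigma)\, \tilde{\Pi}_2(q, \sigma)}{\sum_{\alpha \in \Sigma} \tilde{\Pi}_1(q, \alpha)\, \tilde{\Pi}_2(q, \alpha)} \tag{7} \]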



IV. A Machine Representation of PFSA Sum 

In this section, we investigate the implementation of the 
sum of two PFSA by a sequentially controlled interaction 
of individually generated symbol strings, which form the 
conceptual basis of designing a semantic annihilator. Referring 
to Figure 3, we will call this the Plus-machine. 




Fig. 3: Sum of two PFSA: The Plus-machine $\mathcal{M}(G_1 + G_2)$, 
whose output sequence corresponds to $G_1 + G_2$. 
The machines generate symbols independently, but are allowed 
to change states only if the generated symbols match. 



A. Functional Description of the Plus-Machine 

For a given pair of PFSA $G_1$ and $G_2$, the Plus-machine, 
denoted as $\mathcal{M}(G_1 + G_2)$, has the following components: 

• Copies of the component machines $G_1$ and $G_2$. We 
assume, without loss of generality, that $G_1$ and $G_2$ have 
the same structure (See Definition 13), since this can 
always be arranged (See Proposition 1). 
• A logical AND gate AND $: \Sigma \times \Sigma \to \{0, 1\}$ which operates 
as follows: 

\[ \forall \sigma_i, \sigma_j \in \Sigma, \quad \sigma_i \text{ AND } \sigma_j = \begin{cases} 0 \text{ (false)}, & \text{if } \sigma_i \neq \sigma_j \\ 1 \text{ (true)}, & \text{otherwise} \end{cases} \]

B. Operational Description of the Plus-Machine 
The Plus-machine $\mathcal{M}(G_1 + G_2)$ operates as follows: 

• Each of the component machines, $G_1$ and $G_2$, is initialized 
to the same state $q_0$ in the underlying graph. 

• Each of the component machines, $G_1$ and $G_2$, operates 
in a statistically independent manner to generate symbols 
from the alphabet $\Sigma$. 

• However, to activate a state transition, the generated 
symbols must be passed through the AND gate, upon 
which they must yield a true output. Formally, for symbols 
$\sigma_i, \sigma_j \in \Sigma$ generated at the current states $q_i, q_j \in Q$, 
the next states are 

\[ \begin{cases} \big(\delta(q_i, \sigma_i), \delta(q_j, \sigma_j)\big), & \text{if } \sigma_i \text{ AND } \sigma_j = 1 \\ (q_i, q_j), & \text{otherwise} \end{cases} \]

• The machine is assumed to function inside a "black box", 
with an external observer. The observable output string is 
generated as follows: a generated symbol is observable 
if and only if it causes a state transition. 

The sequential functioning of $\mathcal{M}(G_1 + G_2)$ is illustrated in 
Figure 3. We have the following result: 

Proposition 6 (Semantic Compression): For a given pair of 
PFSA $G_1, G_2$, if the output string from $\mathcal{M}(G_1 + G_2)$ 
is denoted as $x \in \Sigma^*$, then the PFSA $\mathbb{C}(x)$ obtained by 
semantically compressing $x$ is given by the sum $G_1 + G_2$. 

Proof: It follows from the functional description, and the 
following considerations: 

1) The component machines $G_1$ and $G_2$ are always state 
synchronized (follows from the operational description). 

2) The components generate symbols in a statistically 
independent manner. 

3) The probability for $\mathcal{M}(G_1 + G_2)$ to emit a particular 
symbol $\sigma \in \Sigma$, while being at state $(q_i, q_i)$ (i.e., both 
components are at state $q_i$), is given by the probability 
of generating $\sigma$ simultaneously (and independently) by 
both components; the probability of this compound 
symbol (marginalized by the probability of generating 
identical symbols on both machines) is: 

\[ \tilde{\Pi}_{12}\big((q_i, q_i), (\sigma, \sigma)\big) = \frac{\tilde{\Pi}_1(q_i, \sigma)\, \tilde{\Pi}_2(q_i, \sigma)}{\sum_{\alpha} \tilde{\Pi}_1(q_i, \alpha)\, \tilde{\Pi}_2(q_i, \alpha)} \]

where the numerator is the probability of the compound event 
and the denominator is the marginalization; this matches 
exactly with Proposition 4. 

4) Since the internal states of $\mathcal{M}(G_1 + G_2)$ are always of 
the form $(q_i, q_i)$, it is straightforward to see that for any 
correct semantic compression algorithm, the structure 
of the identified PFSA matches with the component 
machines, $G_1$ and $G_2$. The proof is now complete. ■ 
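
A small Monte Carlo experiment illustrates Proposition 6. The sketch below simulates a Plus-machine for two structurally identical PFSA and estimates, per state, the frequencies of the observable symbols; these should approach the normalized product of the morph rows. The integer encoding, the example machines, the seed, and the run length are all assumptions chosen for illustration, and the per-state counting stands in for a real compression algorithm.

```python
import numpy as np

def plus_machine(delta, morph1, morph2, q0, n_steps, rng):
    """Run the Plus-machine; return the observable (state, symbol) pairs."""
    q, out = q0, []
    n_sym = morph1.shape[1]
    for _ in range(n_steps):
        s1 = rng.choice(n_sym, p=morph1[q])   # symbol generated by G1
        s2 = rng.choice(n_sym, p=morph2[q])   # symbol generated by G2, independently
        if s1 == s2:                          # AND gate is true: emit and transition
            out.append((q, s1))
            q = delta[q][s1]
    return out

rng = np.random.default_rng(0)
delta = [[0, 1], [0, 1]]
morph1 = np.array([[0.2, 0.8], [0.4, 0.6]])
morph2 = np.array([[0.1, 0.9], [0.6, 0.4]])

out = plus_machine(delta, morph1, morph2, 0, 40000, rng)
counts = np.zeros_like(morph1)
for q, s in out:
    counts[q, s] += 1
empirical = counts / counts.sum(axis=1, keepdims=True)

expected = morph1 * morph2
expected = expected / expected.sum(axis=1, keepdims=True)
max_dev = np.abs(empirical - expected).max()
```

Only matched symbols reach the output, so the output stream is shorter than the number of simulated steps; this shortening is the effect that the performance analysis has to account for.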



It follows from Proposition 6 that the Plus-machine can be 
used to annihilate information in the symbol string generated 
by a PFSA in the following manner: 

\[ G + H = \mathbb{H}_{-1}(i_\circ) \implies \mathcal{M}(G + H) = \mathbb{H}_{-1}(i_\circ) \tag{10} \]

which implies that if $G$ is the underlying PFSA for the sensed 
process, and we can compute $H$ such that $G + H = \mathbb{H}_{-1}(i_\circ)$, 
and subsequently modify the incoming sensed data stream via 
the Plus-machine construction, we would end up with symbolic 
white noise in the output, which can then be identified 
easily. This, however, is not directly achievable in practice for 
the following reasons: 

1) Impossibility of state synchronization with sensed 
stream. 

2) Impossibility of disabling state transitions in the sensed 
physical process. 

The next section presents modifications to this basic con- 
struction to admit a physically realizable implementation of 
a semantic annihilator. 

V. Semantic Annihilation 

In this section, we assume that we are given a pre-identified 
(during the training phase) pattern library $\mathbb{G} = 
\{G^i : G^i \in \mathscr{A}^+\}$ containing a finite number of patterns of 
interest, represented as PFSA. We construct a semantic 
annihilator for each pattern in $\mathbb{G}$, which is then used in 
online classification. 

We need the following function that operates symbol-wise 
on streams, typically implementing a selective erasure of the 
two input streams ($\epsilon$ is the null symbol, i.e., the identity in 
the concatenative free monoid over the alphabet $\Sigma$): 

Definition 19 (Erasing Function): The erasing function $\mathcal{E} : 
\Sigma \times \Sigma \to \Sigma \cup \{\epsilon\}$ is defined as follows: 

\[ \mathcal{E}(\sigma_1, \sigma_2) = \begin{cases} \sigma_1, & \text{if } \sigma_1 \text{ AND } \sigma_2 = 1 \\ \epsilon, & \text{otherwise} \end{cases} \tag{11} \]



A. Construction of the Semantic Annihilator 



Fig. 4: The block design for a semantic annihilator (blocks: sensed stream, semantic annihilator, check for white noise) 

The component machines are set as follows: 

• Let $G \in \mathbb{G}$ be one element of the pattern library. 
• Construct the additive inverse for $G$, i.e., compute $H$ s.t. 

\[ G + H = \mathbb{H}^+_{-1}(i_\circ) \tag{12} \]

Let the state set for $H$ be $Q_H$, and let $|Q_H| = m$. 
• Create $m$ copies of $H$, each initialized at a distinct state. 
Let $H^j$ be the copy of the PFSA $H$ initialized at the $j$-th state $q_j \in Q_H$. 
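
For a PFSA with strictly positive morph probabilities, the additive inverse keeps the structure and replaces each morph row by its normalized elementwise reciprocal, as implied by Eqs. (3)-(6). The helper names below are hypothetical and the example matrix is an assumption; the sketch only checks that the sum of a machine and its inverse has uniform rows.

```python
import numpy as np

def pfsa_inverse(morph):
    """Morph matrix of the additive inverse: row-normalized reciprocals.
    Requires strictly positive entries (the restricted PFSA space)."""
    inv = 1.0 / morph
    return inv / inv.sum(axis=1, keepdims=True)

def pfsa_sum(morph1, morph2):
    """Sum of structurally identical PFSA: normalized elementwise product."""
    prod = morph1 * morph2
    return prod / prod.sum(axis=1, keepdims=True)

morph_g = np.array([[0.2, 0.8],
                    [0.7, 0.3]])
morph_h = pfsa_inverse(morph_g)
white = pfsa_sum(morph_g, morph_h)    # every row should be uniform
```

A matrix with identical uniform rows encodes the same measure as the single-state white-noise machine, which is why the check for uniform rows is sufficient here.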

B. Operational Description of the Annihilator 
The semantic annihilator operates as follows: 

1) Read symbol $\sigma_{sensor}$ from the sensor. 

2) Independently generate symbols $\sigma^j$ for each component 
$H^j$. 

3) Transition each $H^j$ using the same symbol $\sigma_{sensor}$. 

4) Construct $m$ symbol streams $\omega^j \in \Sigma^*$, $j \in \{1, \cdots, m\}$, 
recursively using the erasing function $\mathcal{E}$: 

\[ \omega^j_{UPDATE} = \omega^j \, \mathcal{E}(\sigma_{sensor}, \sigma^j) \tag{13} \]

5) Check if any $\omega^j$ is in fact symbolic white noise. 
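
The five steps above can be sketched end to end as follows. Everything in this fragment is an illustrative assumption: the integer encoding, the two-state example whose structure never self-synchronizes, the seed, and in particular `whiteness_gap`, which is a crude frequency-based surrogate for the white-noise check rather than a compression algorithm.

```python
import numpy as np

def generate(delta, morph, q0, n, rng):
    """Sensed stream emitted by the pattern PFSA G."""
    q, out = q0, []
    for _ in range(n):
        s = int(rng.choice(morph.shape[1], p=morph[q]))
        out.append(s)
        q = delta[q][s]
    return out

def erase(s_sensor, s_gen):
    """Erasing function: keep the sensed symbol on a match, else None (epsilon)."""
    return s_sensor if s_sensor == s_gen else None

def annihilate(stream, delta, morph_h, rng):
    m, n_sym = morph_h.shape
    states = list(range(m))                  # copy H^j starts at state j
    outputs = [[] for _ in range(m)]
    for s in stream:                         # step 1: read the sensed symbol
        for j in range(m):
            g = rng.choice(n_sym, p=morph_h[states[j]])   # step 2
            kept = erase(s, g)                            # step 4
            if kept is not None:
                outputs[j].append(kept)
            states[j] = delta[states[j]][s]               # step 3
    return outputs

def whiteness_gap(w, n_sym):
    """Largest deviation from 1/n_sym among symbol frequencies and
    frequencies conditioned on the previous symbol."""
    w = np.asarray(w)
    target = 1.0 / n_sym
    gap = np.abs(np.bincount(w, minlength=n_sym) / len(w) - target).max()
    for prev in range(n_sym):
        nxt = w[1:][w[:-1] == prev]
        if len(nxt):
            freq = np.bincount(nxt, minlength=n_sym) / len(nxt)
            gap = max(gap, np.abs(freq - target).max())
    return gap

rng = np.random.default_rng(0)
delta = [[1, 0], [0, 1]]                     # symbol 0 toggles the state
morph_g = np.array([[0.2, 0.8], [0.7, 0.3]])
inv = 1.0 / morph_g
morph_h = inv / inv.sum(axis=1, keepdims=True)   # additive inverse of G

stream = generate(delta, morph_g, 0, 20000, rng)
outputs = annihilate(stream, delta, morph_h, rng)
gaps = [whiteness_gap(w, 2) for w in outputs]    # step 5
```

In this setup the copy started at state 0 stays synchronized with the generator and its stream should look uniform, while the copy started at state 1 never synchronizes and retains a visible bias.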
Next we present the main result (Proposition 7), which 

rigorously establishes the annihilation concept as a viable tool 
for pattern classification. 

Proposition 7 (Main Result): At least one of the constructed 
streams $\omega^j$ will semantically compress to symbolic 
white noise if and only if $G + H = \mathbb{H}^+_{-1}(i_\circ)$, i.e., 

\[ G + H = \mathbb{H}^+_{-1}(i_\circ) \iff \exists j \ \big(\mathbb{C}(\omega^j) = \mathbb{H}_{-1}(i_\circ)\big) \tag{14} \]

(See Notation 3 and note that $H^j$ is a copy of $H$ initialized 
at the $j$-th state.) 

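Proof: (Left to Right:) Let the sensed process be generated 
by the underlying PFSA $G$, such that $G + H = \mathbb{H}_{-1}(i_\circ)$. 
We note that, by construction, there exists $j^\star$ such that $H^{j^\star}$ 
is always state synchronized with $G$. However, we only see 
symbols in the output stream $\omega^{j^\star}$ if the generated symbols 
are identical. It follows that, on compression, $\omega^{j^\star}$ would yield 
a modified PFSA (denoted by $G^{mod}$) with structure identical 
to $G$, but each row of the $\tilde{\Pi}^{mod}$ matrix would be modified as 
follows: 

\[ \forall \sigma \in \Sigma, \quad \tilde{\Pi}^{mod}(q_i, \sigma) = \frac{\tilde{\Pi}_G(q_i, \sigma)\, \big(\tilde{\Pi}_G(q_i, \sigma)\big)^{-1} K(q_i)}{\sum_{\alpha \in \Sigma} \tilde{\Pi}_G(q_i, \alpha)\, \big(\tilde{\Pi}_G(q_i, \alpha)\big)^{-1} K(q_i)} = \frac{1}{|\Sigma|} \]

where $K(q_i) = \Big(\sum_{\sigma} \big(\tilde{\Pi}_G(q_i, \sigma)\big)^{-1}\Big)^{-1}$ is the normalizing constant, 
implying each row is identical and uniform, which in turn 
implies that the identified model is symbolic white noise. 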
(Right to Left:) We show this by contradiction as follows: 
Let the sensed process be generated by $G$ such that 

\[ G + H \neq \mathbb{H}_{-1}(i_\circ) \tag{15} \]

and assume, if possible, that there exists a constructed stream 
$\omega^{j^\star}$ which compresses to white noise. Although we cannot 
assume that any $H^j$ is state synchronized with $G$ directly, we 
can consider the structure of both $G$ and $H$ to be represented 
(without loss of generality) by the one for $G \times H$, in which 
they can be assumed to be synchronized (since state $q_i$ in 
$G$ and $q_k$ in $H$ can be mapped to state $(q_i, q_k)$ in $G \times H$). 
Denoting the machines modified by the synchronous product 
as $G^\times$ and $H^\times$ respectively, we note: 

\[ G^\times + H^\times = \mathbb{H}_{-1}(i_\circ) \quad \text{(By Assumption)} \tag{16} \]

But since $\mathbb{H}(G^\times) = \mathbb{H}(G)$ and $\mathbb{H}(H^\times) = \mathbb{H}(H)$ (since the 
underlying measures are not modified by going to a non-minimal 
realization via synchronous product), it follows: 

\[ \mathbb{H}(G) \oplus \mathbb{H}(H) = i_\circ \implies G + H = \mathbb{H}_{-1}(i_\circ) \tag{17} \]

which contradicts Eq. (15). This completes the proof. ■ 
Our key motivation for developing the annihilator was to 
be able to classify PFSA-based patterns faster and in a more 
robust fashion in real-time or near-real-time field operation. 
The argument for robustness is straightforward, since one-state 
models, especially with uniform generation probabilities of the 
symbols (i.e. white noise) are the easiest ones to identify 
reliably for any compression algorithm. The argument for 
fast identification is more involved, primarily due to the fact 
that the annihilators selectively erase symbols leading to a 
decrease in the lengths of the observed symbol strings. Thus, 
although we only need to check for white noise in the outputs 
(which is significantly faster compared to directly identifying 
the original pattern), the fact that now we are dealing with 
a shorter string, implies that there is the possibility that the 
increased speed of identification is offset by the slow down 
of the rate of symbol production at the outputs. In the next 
section, we investigate this issue in more details, and derive 
rigorous performance guarantees. 

VI. Performance Of Semantic Annihilators 

We need the notion of a stationary distribution on the states of a given PFSA. This is in fact the stationary distribution for the stochastic transition probability matrix that can be computed from the connectivity graph and the symbol generation probabilities $\widetilde{\Pi}$. Also, as stated before, we assume that all PFSA considered in this paper are irreducible, i.e., have a strongly connected graph and hence yield an irreducible transition probability matrix.

Definition 20 (Stationary Distribution): For a given PFSA $G = (Q, \Sigma, \delta, \widetilde{\Pi})$, the stationary distribution $\wp^G \in [0,1]^{|Q|}$, with $\sum_i \wp^G_i = 1$, is defined as follows:

1) Construct the transition probability matrix $\Pi$ as:

$$\forall q_i, q_j \in Q, \quad \Pi_{ij} = \sum_{\sigma_k \,:\, \delta(q_i, \sigma_k) = q_j} \widetilde{\Pi}(q_i, \sigma_k) \tag{18}$$

2) Noting that $\Pi$ is an irreducible stochastic matrix, compute $\wp^G$ as the stationary probability distribution for the state transition matrix $\Pi$, i.e., $\wp^G$ is the unique sum-normalized left eigenvector of $\Pi$ satisfying $\wp^G \Pi = \wp^G$.

It follows from the irreducibility assumption that the stationary distribution is unique for a given PFSA, and has no dependence on the initial state [20].
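As a hedged illustration of Definition 20, the following Python sketch builds the state transition matrix from a transition map and a morph matrix, and then approximates the stationary distribution by a damped power iteration. The list-of-lists encoding (`delta[i][k]`, `morph[i][k]`), the function names, and the two-state example are assumptions made for illustration only; they are not taken from the paper, and any eigenvector routine could be used instead.

```python
def transition_matrix(delta, morph):
    """Eq. (18): pi[i][j] sums morph[i][k] over symbols k with delta[i][k] == j."""
    n = len(morph)
    pi = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k, prob in enumerate(morph[i]):
            pi[i][delta[i][k]] += prob
    return pi


def stationary_distribution(pi, tol=1e-13, max_iter=100_000):
    """Approximate the left eigenvector p with p * pi = p and sum(p) = 1.

    The update averages p with p * pi (a "lazy" chain), which has the same
    stationary distribution and also converges for periodic irreducible chains.
    """
    n = len(pi)
    p = [1.0 / n] * n
    for _ in range(max_iter):
        step = [sum(p[i] * pi[i][j] for i in range(n)) for j in range(n)]
        new_p = [0.5 * a + 0.5 * b for a, b in zip(p, step)]
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# Hypothetical two-state PFSA over the alphabet {0, 1}.
delta = [[0, 1], [0, 1]]            # delta[state][symbol] -> next state
morph = [[0.7, 0.3], [0.4, 0.6]]    # morph[state][symbol] -> probability
pi = transition_matrix(delta, morph)
p = stationary_distribution(pi)
```

For this example the balance condition $0.3\,\wp_1 = 0.4\,\wp_2$ gives $\wp = (4/7, 3/7)$, which the iteration should reproduce to within the tolerance.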

Notation 4: In the sequel, we use the notation: $\wp^G_{\min} = \min_{q_i \in Q} \wp^G_i$.

Also, our assumption of irreducible models leads to the 
following property for the stationary distribution: 

Proposition 8: For any PFSA $G = (Q, \Sigma, \delta, \widetilde{\Pi})$ with an irreducible underlying graph, $\wp^G > 0$ elementwise, i.e., $\wp^G_i > 0$ for all $q_i \in Q$.

Proof: Since $\Pi$ is irreducible for such $G$, no non-negative left eigenvector of $\Pi$ has a zero coordinate [20]. ■
We want to estimate the shortening experienced by the sensed 
symbol strings due to the annihilation operation. We require 
the notion of the auxiliary PFSA A(G) for a given PFSA 
G, which captures the simultaneous operation of the two 
machines, without erasure of the non-matching symbols. 

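Definition 21 (Auxiliary PFSA): For a given PFSA $G = (Q, \Sigma, \delta, \widetilde{\Pi})$, the auxiliary PFSA $A(G)$ is defined as $A(G) = (Q, \Sigma \cup \Sigma', \delta^A, \widetilde{\Pi}^A)$, where $\Sigma'$ is a distinct isomorphic copy of $\Sigma$, with $\mathcal{J} : \Sigma \to \Sigma'$ being the (bijective) isomorphism, and:

$$\delta^A(q_i, \sigma) = \begin{cases} \delta(q_i, \sigma) & \text{if } \sigma \in \Sigma \\ \delta(q_i, \mathcal{J}^{-1}(\sigma)) & \text{otherwise} \end{cases} \tag{19a}$$

$$\widetilde{\Pi}^A(q_i, \sigma) = \begin{cases} \dfrac{\mathcal{H}_i}{|\Sigma|} & \text{if } \sigma \in \Sigma \\ \widetilde{\Pi}(q_i, \mathcal{J}^{-1}(\sigma)) - \dfrac{\mathcal{H}_i}{|\Sigma|} & \text{otherwise} \end{cases} \tag{19b}$$

where $\mathcal{H}_i$ is the harmonic mean of the $i$-th row of the $\widetilde{\Pi}$ matrix for $G$.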

Proposition 9 (Properties of the Auxiliary Automaton): The auxiliary automaton $A(G) = (Q, \Sigma \cup \Sigma', \delta^A, \widetilde{\Pi}^A)$ has the following properties:

1) $\wp^{A(G)} = \wp^G$

2) If $H$ is the annihilator component that is correctly state-synchronized with $G$ (where $G$ is the correct PFSA corresponding to the annihilator), then $A(G)$ correctly tracks $H$ (state-wise and symbol-wise), if we consider all $\sigma \in \Sigma'$ to be unobservable.

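Proof: (1) follows immediately from Definition 21 by noting that the probability transition matrix is left unaltered in the construction of $A(G)$. For (2), we note that the transition structure for $H$ (and hence $G$) is recovered if we map $\forall \sigma \in \Sigma'$, $\sigma \mapsto \mathcal{J}^{-1}(\sigma)$. Next, we compute the probability $p_{obs}(q_i, \sigma)$ of an observable $\sigma$ when $H$ is at state $q_i$ as:

$$\forall \sigma \in \Sigma, \quad p_{obs}(q_i, \sigma) = \widetilde{\Pi}^G(q_i, \sigma)\, \widetilde{\Pi}^H(q_i, \sigma) = \widetilde{\Pi}^G(q_i, \sigma)\, \frac{\left(\widetilde{\Pi}^G(q_i, \sigma)\right)^{-1}}{\sum_{\sigma' \in \Sigma} \left(\widetilde{\Pi}^G(q_i, \sigma')\right)^{-1}} = \frac{\mathcal{H}_i}{|\Sigma|} \tag{20}$$

It follows that the probability of an unobservable $\sigma$ when $H$ is at state $q_i$ is given by:

$$\forall \sigma \in \Sigma', \quad p_{unobs}(q_i, \sigma) = \widetilde{\Pi}^G(q_i, \mathcal{J}^{-1}(\sigma)) - \frac{\mathcal{H}_i}{|\Sigma|} \tag{21}$$

which completes the proof. ■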
Corollary 1 (to Proposition 9): If the length of the symbol string generated by $G$ is denoted by $L_G$, then the length $L_{ann}$ of the correctly annihilated string satisfies:

$$\lim_{L_G \to \infty} \frac{L_{ann}}{L_G} = \sum_{i=1}^{|Q|} \wp^G_i \mathcal{H}_i \tag{22}$$

Proof: We first note that the stationary frequency distribution of the symbols (over alphabet $\Sigma$) in a string generated by an arbitrary irreducible PFSA $G_{arb}$ is given by:

$$\wp^{G_{arb}}\, \widetilde{\Pi}^{G_{arb}} \tag{23}$$

where the independence from the initial state follows from the irreducibility of $G_{arb}$. It then follows from Proposition 9 that the frequency distribution (over $\Sigma \cup \Sigma'$) for the auxiliary automaton $A(G)$ is given by:

$$\wp^{A(G)}\, \widetilde{\Pi}^{A(G)} = \wp^G\, \widetilde{\Pi}^{A(G)} \tag{24}$$

which in turn implies (see Eq. (19b)) that the probability $\lambda$ that any symbol generated by $G$ is observable is given by:

$$\lambda = \sum_{\sigma \in \Sigma} \wp^G\, \widetilde{\Pi}^{A(G)} \Big|_{\sigma} = \sum_{i=1}^{|Q|} \wp^G_i \mathcal{H}_i \tag{25}$$

This completes the proof. ■
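The right-hand side of Eq. (22) is easy to evaluate numerically. The sketch below is an illustrative assumption rather than the authors' code: it takes a morph matrix as a list of rows and a stationary distribution supplied by hand, and returns the expected fraction of sensed symbols that survive correct annihilation. The names `harmonic_mean` and `retained_fraction` are hypothetical.

```python
def harmonic_mean(row):
    """Harmonic mean of one row of the morph matrix (all entries must be > 0)."""
    return len(row) / sum(1.0 / x for x in row)


def retained_fraction(stationary, morph):
    """Right-hand side of Eq. (22): sum over states of p_i * H_i."""
    return sum(p_i * harmonic_mean(row) for p_i, row in zip(stationary, morph))


# Hypothetical two-state example over a binary alphabet. The stationary
# distribution (4/7, 3/7) corresponds to delta = [[0, 1], [0, 1]] and is
# entered by hand here to keep the sketch self-contained.
morph = [[0.7, 0.3], [0.4, 0.6]]
stationary = [4 / 7, 3 / 7]
fraction = retained_fraction(stationary, morph)
```

In this example $\mathcal{H}_1 = 0.42$ and $\mathcal{H}_2 = 0.48$, so roughly 44.6% of the sensed symbols would be expected to survive. Since the harmonic mean of a row never exceeds its arithmetic mean $1/|\Sigma|$, the retained fraction is at most $1/|\Sigma|$, with equality for symbolic white noise.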
Next, we define the coefficient of annihilation advantage: 

Definition 22 (Coefficient of Annihilation Advantage): For a given PFSA $G = (Q, \Sigma, \delta, \widetilde{\Pi})$, let $L_d$ be the string length required for direct identification via semantic compression, and let $L_w$ be the string length required for identifying symbolic white noise. Then the Coefficient of Annihilation Advantage ($\beta$) is defined as the ratio:

$$\beta = \frac{L_w}{L_d \sum_{i=1}^{|Q|} \wp^G_i \mathcal{H}_i} \tag{26}$$



Remark 3: It follows that when we have enough data to do a direct compression (say, of length $L_d$), the expected length of the correctly annihilated string is given by $L_d \sum_{i=1}^{|Q|} \wp^G_i \mathcal{H}_i$. Since we are required to identify symbolic white noise at the annihilator output, and the string length for identification of symbolic white noise is denoted by $L_w$, identification via semantic annihilation is advantageous if $L_w < L_d \sum_{i=1}^{|Q|} \wp^G_i \mathcal{H}_i$, i.e., if $\beta < 1$.

In the sequel, we compute upper bounds on the Coefficient of Annihilation Advantage $\beta$. To do so, we need to relate the lengths $L_d$ and $L_w$. However, we wish to achieve this without reference to any specific algorithm for semantic compression, i.e., we want the computed bounds to hold irrespective of the manner in which we construct PFSA models out of symbol strings. We note that if we compress a string generated by symbolic white noise, then we would expect to obtain a single-state PFSA with equiprobable symbols. However, since we are dealing with probabilistic generators, observing one symbol each from the alphabet would not be sufficient; or rather, it would be a very poor way of inferring that the symbol string is generated by symbolic white noise. Since we assume that $L_w$ is the string length required for the identification (for the particular algorithm, whichever that may be), the number of symbols of each label that we need to observe would be at least $\frac{1}{|\Sigma|} L_w$. In the sequel, we assume that for an arbitrary PFSA, the number of symbols of each label that we need to observe at each state must also be at least $\frac{1}{|\Sigma|} L_w$, since the chosen algorithm apparently requires this many observations for statistical inference.






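Proposition 10 (Upper Bound for $\beta$): For a given PFSA $G = (Q, \Sigma, \delta, \widetilde{\Pi})$ with an irreducible underlying graph, which is not a realization of symbolic white noise, we have the following upper bounds:

$$\text{1)} \quad \beta < \frac{|\Sigma|\,|Q|\,\wp^G_{\min}}{\sum_{i=1}^{|Q|} \sum_{j=1}^{|Q|} \wp^G_i \mathcal{H}_i \mathcal{H}_j^{-1}} \tag{27a}$$

$$\text{2)} \quad \beta \leq \frac{|\Sigma|}{|Q|} \tag{27b}$$

Proof: We note that for each state $q_i \in Q$, we have:

$$\wp^G_{\min} L_d \min_{\sigma \in \Sigma} \widetilde{\Pi}(q_i, \sigma) \geq \frac{1}{|\Sigma|} L_w \tag{28}$$

which follows from noting that $\wp^G_{\min} L_d$ bounds from below the expected number of times state $q_i$ is visited, and hence $\wp^G_{\min} L_d \min_{\sigma \in \Sigma} \widetilde{\Pi}(q_i, \sigma)$ bounds from below the expected number of occurrences of the least likely symbol generated at $q_i$. It follows:

$$L_d \geq \frac{L_w}{|\Sigma|\, \wp^G_{\min} \min_{\sigma \in \Sigma} \widetilde{\Pi}(q_i, \sigma)} \tag{29}$$

$$\implies |Q|\, L_d \geq \frac{L_w}{|\Sigma|\, \wp^G_{\min}} \sum_{i=1}^{|Q|} \frac{1}{\min_{\sigma \in \Sigma} \widetilde{\Pi}(q_i, \sigma)} > \frac{L_w}{|\Sigma|\, \wp^G_{\min}} \sum_{i=1}^{|Q|} \mathcal{H}_i^{-1} \tag{30}$$

$$\implies \beta = \frac{L_w}{L_d \sum_{i=1}^{|Q|} \wp^G_i \mathcal{H}_i} < \frac{|\Sigma|\,|Q|\,\wp^G_{\min}}{\sum_{i=1}^{|Q|} \sum_{j=1}^{|Q|} \wp^G_i \mathcal{H}_i \mathcal{H}_j^{-1}} \tag{31}$$

Note that the strict bound in the second inequality in Eq. (30) follows from the fact that $G$ is not a realization of symbolic white noise, implying $\exists q_i \in Q,\ \mathcal{H}_i > \min_{\sigma \in \Sigma} \widetilde{\Pi}(q_i, \sigma)$, which completes the proof of Statement (1). For Statement (2), we first note that for any sequence of positive real numbers, the harmonic mean of the sequence is bounded above by its arithmetic mean. Hence, it follows that:

$$\frac{|Q|}{\sum_{j=1}^{|Q|} \mathcal{H}_j^{-1}} \leq \frac{\sum_{i=1}^{|Q|} \mathcal{H}_i}{|Q|} \implies \frac{1}{\sum_{i=1}^{|Q|} \mathcal{H}_i \sum_{j=1}^{|Q|} \mathcal{H}_j^{-1}} \leq \frac{1}{|Q|^2}$$

$$\implies \frac{1}{\sum_{i=1}^{|Q|} \sum_{j=1}^{|Q|} \wp^G_i \mathcal{H}_i \mathcal{H}_j^{-1}} \leq \frac{1}{\wp^G_{\min} \sum_{i=1}^{|Q|} \sum_{j=1}^{|Q|} \mathcal{H}_i \mathcal{H}_j^{-1}} \leq \frac{1}{\wp^G_{\min} |Q|^2}$$

$$\implies \frac{|\Sigma|\,|Q|\,\wp^G_{\min}}{\sum_{i=1}^{|Q|} \sum_{j=1}^{|Q|} \wp^G_i \mathcal{H}_i \mathcal{H}_j^{-1}} \leq \frac{|\Sigma|\,|Q|\,\wp^G_{\min}}{|Q|^2\, \wp^G_{\min}} = \frac{|\Sigma|}{|Q|}$$

where the last step follows from the fact that irreducibility of $G$ guarantees $\wp^G_{\min} > 0$. This completes the proof. ■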
Remark 4: Note that although we assume that $G$ is not a realization of symbolic white noise, we could not assume $\exists q_i \in Q,\ \wp^G_i > \wp^G_{\min}$, which would have made the bound in Statement (2) strict. The reason is that it is possible for a PFSA to have non-uniform symbol generation probabilities from some states, and yet end up having a uniform stationary distribution over its states. Note here that the property of being white (in the way we defined it) has to do with the uniformity of the rows of the $\widetilde{\Pi}$ matrix, and not the stationary probabilities.

Remark 5: Proposition 10 is a strong result which implies that pattern classification via semantic annihilators is in fact advantageous for most PFSA encountered in practice, where typically one has a relatively small number of alphabet symbols and a possibly large number of machine states.

Remark 6: The bounds computed in Proposition 10 are not tight. Specifically, note that we neglected the fact that for a general PFSA, the string length for identification could be significantly greater due to issues relating to adequately achieving statistical stationarity of the observed stream. Thus, even for models for which $|\Sigma| > |Q|$, it is not automatic that identification via annihilation is slower than direct compression.

VII. Summarized Algorithms for Classification 
Via Semantic Annihilation 

For each pattern in the specified pattern library, we first compute the inverse PFSA using Algorithm 1. Note that step 4 in Algorithm 1 is well-defined (and does not encounter a divide-by-zero error) on account of our assumption of the restricted set (see Definition 15). Once the inverse patterns are computed, we need to set up the pattern-specific annihilators. Namely, for each pattern with $|Q|$ states, we need $|Q|$ copies of the inverse, each initialized to a distinct state, as stated before. The annihilation process requires sequential generation of symbols from these initialized PFSA, in accordance with their computed morph matrices. This is done as follows:

1) Given the current state, we first select the corresponding row of the morph matrix, which specifies the probability distribution of the to-be-generated symbol over the alphabet.

2) We generate a symbol in accordance with this distribution.

There are standard ways of selecting an outcome in accordance with a specified distribution. We explicitly state one method involving a uniform random number generator with range [0, 1], which guarantees that the asymptotic time-complexity of this choice is $O(\log_2(|\Sigma|))$ (see Algorithm 2). The stated approach involves considering the cumulative distribution for the symbol. Since this has to be done each time a symbol is generated, we compute the cumulative morph matrix $\widetilde{\Pi}^{cum}$ for the inverted models offline as follows:

Definition 23 (Cumulative Morph Matrix): The cumulative morph matrix $\widetilde{\Pi}^{cum}$ is computed as follows:

$$\widetilde{\Pi}^{cum}_{ij} = \sum_{k=1}^{j} \widetilde{\Pi}_{ik} \tag{32}$$
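A minimal sketch of the offline computation in Definition 23, assuming the morph matrix is stored as a list of rows (the function name `cumulative_morph` and the sample matrix are hypothetical). Pinning the last entry of each row to 1 is an added safeguard against floating-point rounding and is not part of the definition.

```python
from itertools import accumulate


def cumulative_morph(morph):
    """Row-wise running sums of the morph matrix, as in Eq. (32)."""
    cum = []
    for row in morph:
        cum_row = list(accumulate(row))
        cum_row[-1] = 1.0  # assumption: absorb rounding error in the final entry
        cum.append(cum_row)
    return cum


# Hypothetical two-state morph matrix over a three-symbol alphabet.
cum = cumulative_morph([[0.2, 0.3, 0.5], [0.6, 0.1, 0.3]])
```

Each resulting row is non-decreasing and ends at 1, which is the form of the input vector $v$ expected by the symbol generation routine.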



The sequential symbol generation then uses rows of the cumulative morph matrix, instead of rows of the morph matrix itself, as the input $v$ to Algorithm 2.

Algorithm 1: Computation of Inverse Pattern

  input : PFSA $G = (Q, \Sigma, \delta, \widetilde{\Pi})$
  output: PFSA $-G$
   1 begin
   2   for $i = 1 : |Q|$ do
   3     for $j = 1 : |\Sigma|$ do
   4       $\widetilde{\Pi}'_{ij} = 1 / \widetilde{\Pi}_{ij}$;
   5     endfor
   6     for $j = 1 : |\Sigma|$ do
   7       $\widetilde{\Pi}'_{ij} = \dfrac{1 / \widetilde{\Pi}_{ij}}{\sum_{k} 1 / \widetilde{\Pi}_{ik}}$;
   8     endfor
   9   endfor
  10   $-G = (Q, \Sigma, \delta, \widetilde{\Pi}')$;
  11 end
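Algorithm 1 amounts to a row-wise reciprocal followed by a renormalization. The following Python sketch assumes the morph matrix is a list of rows with strictly positive entries (the restricted set); the function name `inverse_morph` and the explicit positivity check are illustrative additions.

```python
def inverse_morph(morph):
    """Morph matrix of the inverse PFSA: normalized reciprocals of each row."""
    inverse = []
    for row in morph:
        if any(x <= 0.0 for x in row):
            # Outside the restricted set the reciprocal is undefined.
            raise ValueError("every symbol probability must be strictly positive")
        reciprocals = [1.0 / x for x in row]
        total = sum(reciprocals)
        inverse.append([r / total for r in reciprocals])
    return inverse


# Hypothetical two-state morph matrix over a binary alphabet.
inv = inverse_morph([[0.25, 0.75], [0.5, 0.5]])
```

For the first row, the inverse probabilities are (0.75, 0.25), so the entrywise products with the original row are equal (0.25 x 0.75 = 0.75 x 0.25). This equality is what makes the matched symbols equiprobable at a correctly synchronized state, as in Eq. (20). A uniform row is its own inverse.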



Lemma 1: Assuming that uniform random numbers in the range [0, 1] can be generated in constant time, the asymptotic time-complexity of Algorithm 2 is $O(\log_2(|\Sigma|))$.

Proof: We note that the number of possible choices for the to-be-generated symbol is halved in each iteration, implying that the number of iterations $I$ satisfies:

$$2^{I} = |\Sigma| \implies I = \log_2(|\Sigma|)$$

which completes the proof. ■
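The bisection behind Lemma 1 can be sketched in Python as follows. This is an illustrative rendering with 0-based indices, not the authors' implementation; the names `pick_index` and `generate_symbol` are hypothetical, and the midpoint is taken as the rounded-up average of the two bounds so that the search interval shrinks in every pass.

```python
import random


def pick_index(cum_row, key):
    """Smallest index i with key <= cum_row[i], found by bisection."""
    lower, upper = 0, len(cum_row) - 1
    while upper > lower + 1:
        mid = (lower + upper + 1) // 2   # midpoint, rounded up
        if key <= cum_row[mid]:
            upper = mid
        else:
            lower = mid
    return lower if key <= cum_row[lower] else upper


def generate_symbol(cum_row, alphabet, rng=random):
    """Draw one symbol according to a cumulative distribution row."""
    return alphabet[pick_index(cum_row, rng.random())]


# Hypothetical cumulative row for probabilities (0.2, 0.3, 0.5).
cum_row = [0.2, 0.5, 1.0]
index = pick_index(cum_row, 0.3)
```

With the key 0.3, the search returns index 1, the second symbol, since 0.2 < 0.3 <= 0.5. Python's standard `bisect.bisect_left` performs the same search on a sorted row and could be used in place of the explicit loop.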
Each copy of the inverted model in the annihilator accesses the sensed symbol, generates its own symbol in accordance with its current state, reports the symbol if there is a match, and finally updates the current state using the sensed symbol. The sequence of moves for each component (or copy) is enumerated in Algorithm 3. The $|Q|$ reported streams are individually compressed to check if any is in fact white noise.
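The per-component loop just described can be sketched as below. This is a hedged illustration under several assumptions: symbols and states are encoded as integer indices, the sensed stream is a finite list rather than an unbounded online stream, and the standard-library `bisect_left` stands in for the bisection of Algorithm 2. The function name `annihilate_component` and its argument names are hypothetical.

```python
import random
from bisect import bisect_left


def annihilate_component(sensed, cum_inverse, delta, init_state, rng):
    """Run one annihilator copy over a finite sensed stream.

    cum_inverse[q] is the cumulative morph row of the inverse model at state q;
    delta[q][s] is the next state from state q on sensed symbol s.
    """
    state = init_state
    reported = []
    for symbol in sensed:
        row = cum_inverse[state]
        # First index whose cumulative probability reaches the random key.
        generated = min(bisect_left(row, rng.random()), len(row) - 1)
        if generated == symbol:
            reported.append(generated)   # the symbol survives annihilation
        state = delta[state][symbol]     # always track the sensed symbol
    return reported
```

As a usage example, a single-state source with symbol probabilities (0.25, 0.75) has the inverse row (0.75, 0.25), i.e. `cum_inverse = [[0.75, 1.0]]` and `delta = [[0, 0]]`. Each sensed symbol then survives with probability 0.25 x 0.75 + 0.75 x 0.25 = 0.375, and the two symbols should appear in the reported stream with roughly equal frequency.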

VIII. Asymptotic Complexity Analysis 

We ascertain the asymptotic time complexity per sensed 
symbol of the online portion of the annihilation process, 
assuming the pattern corresponding to the annihilator is indeed 
present in the sensed stream. This analysis is important since 
the annihilator is processing a multi-stream input, and we need 
to convince ourselves that the work required per observed 
symbol is not too great, particularly since an overtly complex 
algorithm will be unable to handle high data rates. 

We assume, as before, that random keys can be generated 
in constant time. Then, we have the following result: 

Proposition 11: For a given PFSA $G = (Q, \Sigma, \delta, \widetilde{\Pi})$, the asymptotic time-complexity $\mathscr{A}_G$ of classification via annihilation, per sensed symbol, is bounded as:

$$\mathscr{A}_G = O(\log_2(|\Sigma|)) \tag{33}$$



Proof: The time-complexity of identification $C_I$, considering all $|Q|$ components of the annihilator, satisfies:



Algorithm 2: Probabilistic Symbol Generation

  input : Non-negative vector $v$ of length $|\Sigma|$ such that $v|_i \leq v|_{i+1}$ and $v|_{|\Sigma|} = 1$; alphabet $\Sigma = \{\sigma_1, \cdots, \sigma_{|\Sigma|}\}$
  output: Generated symbol $\sigma$
   1 begin
   2   Generate random key $K_r \in [0, 1]$;
   3   lowerB = 1;
   4   upperB = $|\Sigma|$;
   5   while upperB > (lowerB + 1) do
   6     $M = \lceil (\text{upperB} + \text{lowerB}) / 2 \rceil$;   /* Rounding to next integer */
   7     if $K_r \leq v|_M$ then
   8       upperB = $M$;
   9     else
  10       lowerB = $M$;
  11     endif
  12   endw
  13   if $K_r \leq v|_{\text{lowerB}}$ then
  14     $\sigma = \sigma_{\text{lowerB}}$;
  15   else
  16     $\sigma = \sigma_{\text{upperB}}$;
  17   endif
  18 end

Algorithm 3: Componentwise Annihilation Operation

  input : Cumulative morph matrix $\widetilde{\Pi}^{cum}$, initial state $q_{init}$, transition function $\delta$
  output: Reported symbol stream
   1 begin
   2   Set current state $q_{curr} = q_{init}$;
   3   while true do   /* Infinite loop */
   4     Observe sensed symbol $\sigma_{sensed}$;
   5     Generate random symbol $\sigma_{gen}$ using the row of $\widetilde{\Pi}^{cum}$ corresponding to $q_{curr}$;   /* Algorithm 2 */
   6     if $\sigma_{gen} = \sigma_{sensed}$ then
   7       Report $\sigma_{gen}$;   /* Annihilated stream */
   8     endif
   9     Update current state $q_{curr} = \delta(q_{curr}, \sigma_{sensed})$;
  10   endw
  11 end



$$C_I \leq T_R\, |Q|\, O(\log_2(|\Sigma|))\, L_w\, C_0 \tag{34}$$

where $T_R$ is the complexity of generating random keys in the range [0, 1], $L_w$ is the string length required for identifying symbolic white noise, and $C_0$ is the time complexity of identifying symbolic white noise (using some given direct compression algorithm, and assuming we check for white noise on each stream after each sensed symbol observation). Hence, assuming that the total sensed string length is $L_d$, it follows that the time complexity per sensed symbol is bounded by:

$$\mathscr{A}_G \leq T_R\, C_0\, |Q|\, O(\log_2(|\Sigma|))\, \frac{L_w}{L_d} \tag{35a}$$

Using Definition 22, we have:

$$\mathscr{A}_G \leq T_R\, C_0\, |Q|\, O(\log_2(|\Sigma|))\, \beta \sum_{i=1}^{|Q|} \wp^G_i \mathcal{H}_i \tag{35b}$$

Using Proposition 10 and noting $\mathcal{H}_i \leq \frac{1}{|\Sigma|}$, we have:

$$\mathscr{A}_G \leq T_R\, C_0\, |Q|\, O(\log_2(|\Sigma|))\, \frac{|\Sigma|}{|Q|} \sum_{i=1}^{|Q|} \wp^G_i \frac{1}{|\Sigma|} \tag{35c}$$

$$\implies \mathscr{A}_G \leq T_R\, C_0\, O(\log_2(|\Sigma|)) \tag{35d}$$

Neglecting constant time factors, we have:

$$\mathscr{A}_G = O(\log_2(|\Sigma|)) \tag{35e}$$

Note that Eq. (35a) is exact and not an averaging, since we do the same work every time a symbol is sensed. This completes the proof. ■

Fig. 5: PFSA models used for simulation; real numbers are symbol probabilities, integers are symbol labels: (a) M2 is a subshift of finite type having the structure of a suffix automaton; (b) S1 is a generalized even shift process, which is a strictly sofic system and has no synchronizing string.

This is a strong result showing that the asymptotic time- 
complexity of classification via annihilation, per symbol, is 
independent of complexity of the pattern and the number of 
PFSA states, and is only mildly dependent on the cardinality 
of the alphabet. Again, since the alphabet sizes are relatively 
small, and recalling that the proposed technique is provably 
faster compared to direct compression for most models, it 
follows that classification via annihilation is indeed highly 
advantageous for online operation. 



IX. Verification & Validation 

In this section, we validate the preceding theoretical developments in simulation. The PFSA models selected for generating the simulated symbol strings are illustrated in Figure 5. The model (M2) shown in Figure 5(a) has the structure of a suffix automaton [7]. PFSA which have such structures are easier to identify from symbolic strings, primarily due to the existence of synchronizing strings [8] (strings which lead to a particular state irrespective of the starting state). For example, in the model M2, the states $q_1, q_2, q_3, q_4$ can be easily seen to represent sets of symbol strings ending in 00, 01, 10, 11 respectively [22]. Although the state structure is not available a priori to the compression algorithm, such so-called $D$-Markov machines [22] are nevertheless significantly easier to identify. For examples of physical situations in anomaly detection which give rise to, or are effectively modeled by, such $D$-Markov machines, the reader is referred to [23], [3]. The second model (S1) (Figure 5(b)) has only two states. However, S1 represents a generalization of the even-shift process, and its underlying graph is an example of a strictly sofic shift process (and not a subshift of finite type [24]). Specifically, S1 does not have any synchronizing strings, i.e., without knowledge of the initial state one cannot infer the current state in a deterministic sense even from arbitrarily long observation strings. Such models are significantly more difficult to identify (see [8] for discussion) for any of the compression algorithms reported in the literature [7], [22].

The algorithm used for direct compression is a modified version of CSSR [10]. We compare PFSA models using the metric $\Theta$ proposed in [9], which is capable of computing distances between PFSA models with different underlying graphs (with identical alphabets). Note that while the output of the direct compression algorithm is compared against the original model, the annihilator output is compared against symbolic white noise.

In the two simulation runs reported, we generate data from the models and compare the string lengths required by direct compression versus classification via annihilation. The annihilators were constructed from the knowledge of the particular model used in the simulation, using the formulation presented in Section V-A. Note that the annihilation technique is not meant for identification of an unknown pattern (i.e., pattern identification), but for detecting whether the sensed symbol string is actually being generated by a known library pattern (i.e., pattern classification). Figure 6(a) illustrates the results for M2, and we note that the annihilator is significantly faster. The principal advantage of using annihilators is better illustrated for S1, where, for reasons explained above, the direct compression algorithm struggles and fails to correctly identify S1 even after 7000 symbols (convergence was observed at around 10000 symbols). The annihilator identifies S1 at just over 1000 symbols, as shown in Figure 6(b). Finally, Figure 6(c) compares the response of an annihilator which does not correspond to the process generating the observed symbols with one that does. Note that in both cases the responses are very stable, with the incorrect annihilator converging to 0.3 and the correct one to (very nearly) 0, which reflects a match.

Fig. 6 (horizontal axes: symbol ticks): (a) Simulation result for model M2: note that direct compression converges at 3500 symbols, while the annihilator converges at just over 1000 symbols. A few intermediate models identified by the direct compression algorithm are also shown; (b) Simulation result for model S1: note that direct compression has failed to converge at 7000 symbols; (c) Comparison of distance of annihilated strings from symbolic white noise for correct and incorrect annihilators, i.e., where we have a pattern match and where we do not.



In general, evaluation of such a pattern match involves using a specified detection threshold, the implications of which are discussed in the next section.

The implication of Proposition 10 is illustrated explicitly in Figure 7(a) (a snapshot of the annihilation process in the simulation runs described above), where we note that the annihilation process essentially erases symbols selectively in the incoming data stream, and hence yields a significantly shorter observed sequence. Although we now have the relatively easier task of identifying whether this annihilated sequence is indeed white, even such an identification cannot be done effectively with too few symbols. Proposition 10 guarantees that the length shortening cannot offset this advantage in practical scenarios, where we are most likely to have more states than the total number of symbols in the alphabet.



X. Intuitive Interpretation & Potential 
Applications 

An important intuitive insight into why the annihilators are able to classify the streams faster is as follows: by avoiding direct compression of the observed symbol sequence, we are essentially solving a classification problem, which is, in general, easier than a full-blown identification problem involving discovery of new patterns. Direct compression not only tells us whether there is a match, but also yields the new PFSA model of the observed sequence when there is no match with the existing templates. Annihilation only indicates the matching template if there is one, and indicates a "no match" otherwise. Thus, the increased efficiency is not surprising. This is particularly useful for templates that have no synchronizing strings (such as the model S1), where for direct compression one needs to distinguish between the states using long observation sequences that disambiguate possible future evolutions based on small deviations in the observed probability distributions on the future strings. If the spectral gap of the corresponding Markov chain is small, the sequences required can turn out to be unacceptably long (since the smaller the spectral gap, the longer the mixing time). This is what we see manifested in Figure 6(b), where direct compression has a hard time. For annihilation, such complexities are absent: the spectral gap does play a role in the degree of shortening of the annihilated sequence (see Proposition 10), but one always looks for symbolic white noise at the annihilator output, irrespective of the complexity of the template.

Fig. 7: (a) Snapshot of the annihilation process; (b) Pattern classification scheme in symbolized sensory data streams.
The idea of pattern classification via controlled information erasure may seem somewhat counter-intuitive at first reading. However, the key notion exploited here has a clear analogue in communication theory, particularly in the theory of matched filters [25]. A matched filter is a theoretical construct (and not the name of a specific filter family) which processes a received signal to minimize the effect of noise, i.e., it maximizes the signal-to-noise ratio (SNR) and simultaneously minimizes the bit error rate (BER). It can be shown that, under the assumption of additive white Gaussian noise (AWGN) in the communication channels, an optimum filter for receiver-end demodulation exists, and is a function only of the transmitted pulse shape. Because of this direct relationship to the transmitted pulse, it is called a matched filter. The derivation of a classical matched filter is essentially based on a direct application of the Schwarz inequality [21], and leads to a very simple and remarkable conclusion:

For AWGN channels, the signal to noise ratio is 
maximized when the impulse response of that filter 
is exactly a reversed and time delayed copy of the 
transmitted signal. 

Since the bit error rate experienced by a signal during demodulation is a function of the signal-to-noise ratio [26], a matched filter which maximizes SNR will automatically provide the lowest possible BER. The analogy of semantic annihilation with matched filters is compelling: instead of using a time-reversed copy of the signal template, we are using the symbol stream generated by an inverse probabilistic automaton. Just as a matched filter functions by convolving the signal with its reversed and delayed copy, the annihilator carries out symbol-wise comparisons between the given symbol stream and the state-specific ones generated by the inverted template, erasing symbols that do not match. The fact that we can carry out this procedure in a deterministic fashion should not be surprising: the convolution in the case of matched filters is generally carried out using fast Fourier transforms (FFT), which is also a rather straightforward deterministic operation. In the latter case, the filtered signal must still be recognized, but this decision-making task is now significantly easier due to the filtration-enhanced SNR. In our case, the annihilator does not output an enhanced signal, but reduces it to white noise if the correct template is used. However, the task of recognizing symbolic white noise is significantly easier than recognizing the template pattern directly, thus reinforcing the analogy. The recognition of symbolic white noise does involve the use of a detection threshold, since in practical scenarios we do not expect the signal and the template to match exactly, given finite-length observation sequences. Thus, when at least one of the PFSA models computed from the annihilator output falls within a pre-specified distance of the white noise model (in the sense of the PFSA metric $\Theta$ [9]), we conclude a positive match. Using arbitrarily small thresholds may require long data streams, and will most likely result in negative matches due to small noise-mediated mismatches between the streams.

The key application that the authors have in mind is pattern classification in symbolized (or quantized) sensory data streams. This particular approach of pattern detection in sensory data has been shown to be significantly more efficient than classical continuous-domain techniques, exhibiting remarkable insensitivity to spurious noise and exogenous disturbances, primarily due to the quantization-mediated coarse-graining, and as a consequence of repeated recurrences of paths in the graph of the finite state machine with relatively few states and a large number of sample points in the (fast scale) time series data [22]. Such PFSA-based pattern classification has recently been applied effectively to anomaly detection problems in complex electro-mechanical machines [23], and to tracking targets via large-scale multi-modal urban sensor networks [27]. The basic philosophy is illustrated in Figure 7(b). Continuous-valued data from sensor(s) is quantized via an appropriately chosen partitioning scheme [3] to yield a symbolic sequence over a pre-specified alphabet (depending on the coarseness of the chosen partition). In the absence of annihilators, one is then required to algorithmically compress a sufficiently long symbolic sequence to extract the underlying causal generative model in the form of a probabilistic finite automaton. The classifier is provided with a template library consisting of PFSA models that encode the pertinent patterns of interest. Once the observed sequence is compressed to a PFSA, this can then be compared against the individual library elements to compute a possible match. The compression algorithms, however, are often expensive, particularly if the underlying PFSA is not a subshift of finite type [24]. Annihilation offers a significantly simpler solution, which skips the compression step altogether. The observed stream can be symbol-wise annihilated using the inverted templates in the library, requiring less data and significantly simpler implementations.

A second promising application is the design of novel PFSA-based modulation-demodulation schemes for communication over noisy channels. In this paper, we considered the special case where the symbol stream generated by a PFSA $G$ is annihilated by the inverse model $-G$. However, in general, one can apply similar ideas to encode a stream from PFSA $G$ using an encoding PFSA $G_e$ as $G + G_e$, and demodulate by "adding" the inverse stream: $G + G_e + (-G_e) = G$. Such avenues will be explored in the future, where careful choice of the encoding PFSA may lead to greater resilience to noise corruption, or even to unauthorized message access.

XI. Summary, Conclusions & Future Work 

We defined an additive abelian group for probability measures on symbolic strings, which induces an abelian group on a slightly restricted set of PFSA. The defined PFSA sum is then used to formulate semantic annihilators, which identify pre-specified patterns of interest via perfect removal of all inter-symbol correlations from observed strings, turning them into symbolic white noise. This approach of classification via annihilation is shown to be advantageous, with theoretical guarantees, for a large class of PFSA models. The results are supported by simulation experiments.

Future work will extend the formulation to models where not all symbols satisfy the condition that the generation probabilities are strictly non-zero from each model state. The effect of noise corruption on observed strings needs to be investigated, with particular emphasis on the comparative effect of noisy observations on direct compression and semantic annihilation. Furthermore, implementation in actual experimental scenarios will further validate the proposed classification technique.